Exploring Grapheme-to-Phoneme Induction with Machine Learning

Authors

  • Terrence Szymanski
  • Kevin Wilson
Abstract

Text-to-speech (TTS) systems are increasingly common in the modern world. One subproblem of TTS is determining the phonetic structure of a word (its pronunciation) from its orthography (its spelling). This is known as the grapheme-to-phoneme (G2P) problem. The task is nontrivial in every language, but particularly so in English, whose rich linguistic history has produced an irregular and inconsistent spelling system. A single letter, even in similar contexts, can be pronounced in several different ways; see Table 1 for examples. The most common way around unpredictable pronunciations is a phonetic dictionary with entries like the rows of Table 1. However, neologisms such as Google in English, Wikcionario in Spanish, and Klimakatastrophe in German constantly creep into the lexicon. Keeping track of all these new words is impossible, yet native speakers can easily pronounce most neologisms at first sight. This suggests that their orthography carries enough information to determine their pronunciation. The first solutions to the G2P problem relied on hand-written sets of pronunciation rules for individual languages. With the rise of the machine learning (ML) paradigm, this approach became outmoded.[4] The basic idea of the ML approach is to learn a set of rules that map strings of graphemes to strings of phonemes based on their orthographic context, i.e., the letters surrounding each grapheme. These rules are learned from a phonetic dictionary such as [2] and can then be applied to new words outside the training set.
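
The idea of learning context-dependent grapheme-to-phoneme rules from a dictionary can be illustrated with a minimal sketch. It is not the authors' method: the toy dictionary, the one-letter-to-one-phoneme alignment (with "_" marking a silent letter), the one-letter context window, the phoneme symbols, and all function names are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical toy dictionary (an assumption, not data from the paper).
# Each word is pre-aligned: one phoneme symbol per letter, "_" = silent.
ALIGNED_DICTIONARY = {
    "cat":  ["k", "ae", "t"],
    "cab":  ["k", "ae", "b"],
    "cot":  ["k", "aa", "t"],
    "cell": ["s", "eh", "l", "_"],
    "city": ["s", "ih", "t", "iy"],
    "net":  ["n", "eh", "t"],
}


def contexts(word):
    """Yield (left, letter, right) for every letter; '#' pads word edges."""
    padded = "#" + word + "#"
    for i in range(1, len(padded) - 1):
        yield padded[i - 1], padded[i], padded[i + 1]


def learn_rules(dictionary):
    """Count phonemes per context and per bare letter, keep the most frequent."""
    by_context = defaultdict(Counter)
    by_letter = defaultdict(Counter)
    for word, phonemes in dictionary.items():
        for ctx, phoneme in zip(contexts(word), phonemes):
            by_context[ctx][phoneme] += 1
            by_letter[ctx[1]][phoneme] += 1

    def most_frequent(table):
        return {key: counts.most_common(1)[0][0] for key, counts in table.items()}

    return most_frequent(by_context), most_frequent(by_letter)


def pronounce(word, context_rules, letter_rules):
    """Apply the context rule if one was seen, else back off to the letter rule."""
    output = []
    for ctx in contexts(word):
        phoneme = context_rules.get(ctx) or letter_rules.get(ctx[1], "?")
        if phoneme != "_":  # silent letters produce no phoneme
            output.append(phoneme)
    return output


context_rules, letter_rules = learn_rules(ALIGNED_DICTIONARY)
print(pronounce("can", context_rules, letter_rules))
print(pronounce("cent", context_rules, letter_rules))
```

Neither "can" nor "cent" is in the toy dictionary, yet the letter c comes out as "k" before a and as "s" before e, because the rules are keyed on the surrounding letters. A realistic system would also have to learn the letter-to-phoneme alignment itself and would typically use wider contexts.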

Similar resources

Machine Learning Based English-to-Korean Transliteration Using Grapheme and Phoneme Information

Machine transliteration is an automatic method to generate characters or words in one alphabetical system for the corresponding characters in another alphabetical system. Machine transliteration can play an important role in natural language application such as information retrieval and machine translation, especially for handling proper nouns and technical terms. The previous works focus on ei...

Phoneme-to-grapheme conversion for out-of-vocabulary words in speech recognition

In this report, we show that Out-Of-Vocabulary items (OOVs), recognized using phoneme recognition, can be reasonably reliably transcribed orthographically using Machine Learning techniques. More specifically, (i) we show baseline performance of a machine learning approach to phoneme-to-grapheme conversion when different levels of artificial noise are added (simulating phoneme recognizer errors)...

Phoneme-to-grapheme Conversion for Out-of-vocabulary Words in Large Vocabulary Speech Recognition

In this paper, we describe a method to enhance the readability of the textual output in a large vocabulary continuous speech recognition system when out-of-vocabulary words occur. The basic idea is to replace uncertain words in the transcriptions with a phoneme recognition result that is postprocessed using a phoneme-to-grapheme converter. This converter turns phoneme strings into grapheme stri...

Optimizing phoneme-to-grapheme conversion for out-of-vocabulary words in speech recognition

In this report, we present the results of further research on phoneme-to-grapheme (P2G) conversion for Out-Of-Vocabulary items (OOVs), recognized using phoneme recognition, in large vocabulary speech recognition. First, we summarize the results of previous research, and then we start with reporting on several optimization strategies for the Machine Learning technique we used to carry out P2G co...

The efficient generation of pronunciation dictionaries: machine learning factors during bootstrapping

Several factors affect the efficiency of bootstrapping approaches to the generation of pronunciation dictionaries. We focus on factors related to the underlying rule-extraction algorithms, and demonstrate variants of the Dynamically Expanding Context algorithm, which are beneficial for this application. In particular, we show that continuous updating of the learned rules, coupled with a new app...

Publication date: 2007